xarray dask
#dask
Tips
Understanding how to use Dask Local Cluster with Xarray - HPC - Pangeo
Video
Dask and Xarray - YouTube
chunk
For small datasets (<100 MB), chunking may not provide significant benefits.
For medium-sized datasets (100 MB - 1 GB), consider chunk sizes in the range of 100 MB - 500 MB.
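For large datasets (>1 GB), chunk sizes can range from 500 MB to several GB, depending on available memory and access patterns. Each worker holds several chunks in memory at once, so a single chunk should be a small fraction of per-worker memory.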
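To check which size band a candidate layout falls into, the in-memory size of one chunk can be estimated as the product of the chunk lengths times the bytes per element. A minimal sketch: the helper name `chunk_nbytes` and the example layout are illustrative assumptions, not part of xarray or Dask.

```python
import math

def chunk_nbytes(chunks, itemsize=8):
    """Approximate bytes in one chunk: product of chunk lengths times bytes per element."""
    return math.prod(chunks.values()) * itemsize

# Hypothetical layout for a float32 variable (4 bytes per element)
layout = {"time": 30, "y": 1000, "x": 1000}
size_mb = chunk_nbytes(layout, itemsize=4) / 1e6
print(f"{size_mb:.0f} MB per chunk")  # 120 MB per chunk
```

The estimate ignores compression on disk; it describes the decoded chunk that a worker has to hold in memory.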
Chunking for your access pattern, not for “nice round numbers”
For ocean data, you typically have one of two dominant read patterns:
long time series at a point/region (moorings, indices, transport time series)
big maps at a few times (snapshots, eddy fields, composites)
The key is to chunk so that your most common slice touches as few chunks as possible. Also, layout strongly affects performance; a 2026 performance study emphasizes that chunk size/layout and access pattern (time series vs spatial frame) materially change read times.
Rule-of-thumb patterns:
code:python
# time-series friendly (few spatial points, many times)
chunks = {"time": 256, "y": 64, "x": 64}
# map-friendly (few times, big maps)
chunks = {"time": 1, "y": 1024, "x": 1024}
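To see why the two layouts above behave so differently, it helps to count how many chunks a read has to open. The sketch below assumes a hypothetical record of 3650 time steps on a 2048 x 2048 grid; `chunks_touched` is an illustrative helper, and it assumes the selection starts on a chunk boundary.

```python
import math

def chunks_touched(selection, chunks):
    """Chunks opened by a contiguous selection that starts on a chunk boundary.

    `selection` maps each dimension to the number of contiguous elements read.
    """
    return math.prod(math.ceil(selection[d] / chunks[d]) for d in chunks)

ts_layout = {"time": 256, "y": 64, "x": 64}       # time-series friendly
map_layout = {"time": 1, "y": 1024, "x": 1024}    # map friendly

point_series = {"time": 3650, "y": 1, "x": 1}     # full record at one grid point
snapshot = {"time": 1, "y": 2048, "x": 2048}      # one full map

for name, sel in [("point series", point_series), ("snapshot", snapshot)]:
    print(name, chunks_touched(sel, ts_layout), chunks_touched(sel, map_layout))
# point series 15 3650
# snapshot 1024 4
```

Each layout is two to three orders of magnitude worse on the access pattern it was not designed for, which is the reason to pick the layout from the dominant read pattern.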
An elegant way to guarantee a single chunk along a dimension:
code:python
x = x.chunk({"time": x.sizes["time"]})  # equivalently: x.chunk({"time": -1})
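Dask performs better when a chain of operations stays lazy and is computed once, so the scheduler can fuse the steps.
Bad (each intermediate is computed and held in memory separately):
code:python
tmp = (ds["u"] * ds["v"]).compute()
tmp2 = (tmp + ds["w"]).compute()
out = tmp2.mean("time")
Better (one lazy expression, computed once):
code:python
out = (ds["u"] * ds["v"] + ds["w"]).mean("time")
This lets Dask fuse the elementwise steps, avoids materializing full-size intermediates, and improves execution efficiency. Naming lazy intermediates without calling .compute() on them builds the same graph as the one-liner, so that by itself costs nothing.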
Run feature extraction “per chunk” with map_blocks (eddies/fronts/MHW masks, etc.)
This pattern lets you apply the same diagnostic to each Dask chunk, which is useful for scalable feature fields (threshold masks, pre-labeling fields, EKE, gradients, etc.).
code:python
import xarray as xr

def eke_block(ds):
    # Example: EKE = 0.5*(u'^2 + v'^2); here u', v' are anomalies from the time mean
    up = ds["u"] - ds["u"].mean("time")
    vp = ds["v"] - ds["v"].mean("time")
    eke = 0.5 * (up**2 + vp**2)
    return xr.Dataset({"eke": eke})

# Chunk in space only: time stays in one chunk so that each block sees the
# full record and the anomalies are taken from the true time mean
ds = ds.chunk({"time": -1, "x": 400, "y": 400})
eke = xr.map_blocks(
    eke_block,
    ds[["u", "v"]],
    template=xr.Dataset({"eke": ds["u"]}),  # must match output dims, dtype, and chunks
)
Key points: always provide a correct template; best suited to computations with fixed-shaped outputs (not variable-length “object lists”).
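A caveat worth checking with any per-chunk diagnostic: a function that takes a mean over a dimension only sees the data in its own block. If that dimension is split across chunks, the anomalies are relative to each block's mean, not the full-record mean. The pure-Python sketch below uses made-up numbers to show the difference; with xarray, keeping the reduced dimension in a single chunk avoids the problem.

```python
def anomalies(values):
    """Subtract the mean of `values` from each element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Made-up velocity record with a trend, so the block means differ
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# One block covering the whole record: anomalies about the true time mean (3.5)
whole = anomalies(u)

# Two blocks of three steps: each block subtracts its own mean (2.0, then 5.0)
blocked = anomalies(u[:3]) + anomalies(u[3:])

print(whole)    # [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
print(blocked)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The blocked result has lost the trend entirely, so an EKE-style quantity computed from it would be underestimated.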
Do chunk design once, then reuse: “rechunk → persist” (or write an intermediate store)
For iterative workflows (composites, filtering, EOFs, regressions), performance is often dominated by chunk layout. A good layout + caching/persistence pays off quickly.
code:python
from dask.distributed import Client
client = Client()
# Example: if you frequently reduce over time, use larger time chunks
ds = ds.chunk({"time": 90, "y": 500, "x": 500})
# Expensive preprocessing done once; reuse many times
by_day = ds["sst"].groupby("time.dayofyear")
anom = (by_day - by_day.mean("time")).persist()  # group-wise subtraction of the daily climatology
# If memory is tight, write the intermediate to Zarr instead of persisting
# anom.to_dataset(name="sst_anom").to_zarr("sst_anom.zarr", mode="w", consolidated=True)
Rule of thumb: avoid extremely tiny chunks that explode task counts; choose chunks around your dominant reduction dimension(s).
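The cost of tiny chunks can be estimated before running anything: the number of chunks per variable is the product over dimensions of ceil(dimension length / chunk length), and every elementwise operation adds at least one task per chunk. A rough sketch, assuming a hypothetical 3650 x 2000 x 2000 grid; `n_chunks` is an illustrative helper, not a library function.

```python
import math

def n_chunks(shape, chunks):
    """Number of chunks needed to tile `shape` with the given chunk lengths."""
    return math.prod(math.ceil(shape[d] / chunks[d]) for d in shape)

shape = {"time": 3650, "y": 2000, "x": 2000}   # hypothetical daily record
tiny = {"time": 1, "y": 100, "x": 100}
moderate = {"time": 90, "y": 500, "x": 500}

print(n_chunks(shape, tiny))      # 1460000
print(n_chunks(shape, moderate))  # 656
```

With the tiny layout a single pass over one variable already produces more than a million tasks, while the moderate layout stays in the hundreds.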
---
Program to test chunk size
code:python
import time
import xarray as xr

def time_chunked_read(dataset, chunk_size):
    # chunk_size is a number of elements along every dimension, not bytes
    start_time = time.perf_counter()
    dataset.chunk(chunk_size).load()
    return time.perf_counter() - start_time

def find_optimal_chunk_size(dataset, chunk_size_range):
    read_times = []
    for chunk_size in chunk_size_range:
        read_time = time_chunked_read(dataset, chunk_size)
        read_times.append((chunk_size, read_time))
    # Sort the results by read time
    read_times.sort(key=lambda x: x[1])
    # Select the chunk size with the lowest read time
    optimal_chunk_size, optimal_read_time = read_times[0]
    return optimal_chunk_size, optimal_read_time

if __name__ == "__main__":
    # Open the NetCDF dataset lazily so that each run includes the read
    dataset = xr.open_dataset('data.nc')
    # Candidate chunk lengths (elements per dimension); adjust to your grid.
    # Values larger than a dimension simply give one chunk along it.
    chunk_size_range = [64, 128, 256, 512, 1024]
    # Find the optimal chunk size
    optimal_chunk_size, optimal_read_time = find_optimal_chunk_size(dataset, chunk_size_range)
    print("Optimal chunk size:", optimal_chunk_size)
    print("Optimal read time:", optimal_read_time)
code:python
import time
import dask
import xarray as xr

def benchmark_chunk_size(dataset, chunk_size):
    # Rechunk to roughly chunk_size bytes per chunk: "auto" chunking takes its
    # target size from Dask's "array.chunk-size" setting
    with dask.config.set({"array.chunk-size": chunk_size}):
        chunked_dataset = dataset.chunk("auto")
    # Perform a representative operation on the dataset to measure performance
    start_time = time.perf_counter()
    chunked_dataset.mean().compute()  # Placeholder: replace with the operation you want to benchmark
    end_time = time.perf_counter()
    # Calculate the execution time
    execution_time = end_time - start_time
    return execution_time

def determine_optimal_chunk_size(dataset, chunk_sizes):
    # Benchmark each chunk size and store the results
    benchmark_results = []
    for chunk_size in chunk_sizes:
        execution_time = benchmark_chunk_size(dataset, chunk_size)
        benchmark_results.append((chunk_size, execution_time))
    # Identify the chunk size with the minimum execution time
    optimal_chunk_size = min(benchmark_results, key=lambda x: x[1])[0]
    return optimal_chunk_size

if __name__ == "__main__":
    # Open the NetCDF dataset lazily (xr.load_dataset would read it all into memory first)
    dataset = xr.open_dataset('dataset.nc')
    # Define a range of target chunk sizes (bytes per chunk) to test
    chunk_sizes = ["100MB", "500MB", "1GB", "2GB", "5GB"]
    # Determine the optimal chunk size
    optimal_chunk_size = determine_optimal_chunk_size(dataset, chunk_sizes)
    print("Optimal chunk size:", optimal_chunk_size)
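A single timed run is noisy (file-system caches, scheduler start-up), so it can help to repeat each measurement and keep the best time. A minimal sketch using only the standard library; `best_of` is a hypothetical helper, and the lambda stands in for whatever operation is being benchmarked.

```python
import time

def best_of(func, repeats=3):
    """Run `func` `repeats` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; replace with a function that chunks, reduces, and computes
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```

Taking the minimum rather than the mean filters out one-off slowdowns. Note that after the first run the file may be served from the operating system's cache, so repeated runs measure warm reads.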
xarray GroupBy.map
Improving GroupBy.map with Dask and Xarray — Coiled documentation
Avoiding Dask graph blow-up with staged reductions
Large reductions (e.g., global integrals over long time series) can create huge task graphs.
Pattern: reduce in stages.
code:python
# Step 1: reduce spatially (keeps time dimension)
spatial_mean = ds["thetao"].mean(("x", "y"))
# Step 2: rechunk to favor time reduction
spatial_mean = spatial_mean.chunk({"time": -1})
# Step 3: reduce temporally
final = spatial_mean.mean("time")
This keeps task graphs manageable and improves scheduler throughput.
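As a sanity check on the staged pattern, the pure-Python sketch below (made-up numbers, no missing values) shows that averaging over space first and then over time gives the same result as one mean over everything. This holds because every time step contributes the same number of points; with masked or missing data the two can differ unless the means are weighted by the number of valid points.

```python
import math

# Made-up field: 3 time steps, each with 4 spatial points
field = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 0.0, 2.0, 2.0],
]

# Stage 1: reduce spatially, keeping the time dimension
spatial_mean = [sum(row) / len(row) for row in field]

# Stage 2: reduce the short time series
staged = sum(spatial_mean) / len(spatial_mean)

# One-shot mean over every value, for comparison
flat = [v for row in field for v in row]
one_shot = sum(flat) / len(flat)

print(spatial_mean)                    # [2.5, 5.0, 1.0]
print(math.isclose(staged, one_shot))  # True
```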
#xarray